





36.Y11 


Applications of Time-Shared Computers in a Statistics Curriculum 


M. Schatzoff 


IBM Cambridge Scientific Center Report 


International Business Machines Corporation 
Cambridge Scientific Center 
Cambridge, Massachusetts 


October, 1966 




Abstract 


This paper describes the application of remote console computing 
in a graduate statistics seminar entitled "Machine Aided Statistical 
Modeling", which was offered at Harvard University by the author 
during the spring semester 1965-1966. Three different computing 
systems are discussed: COMB (Console Oriented Model Building), 
COSMOS (Console Oriented Statistical Matrix Operator System) 
and the Culler On-Line Computer. The first two systems, designed 
by the author, operate under the M.I.T. Compatible Time Sharing 
System. They are directed toward analysis of residual procedures 
and general linear hypothesis calculations, respectively. The third 
system, physically located at the University of California at 

Santa Barbara, is a general purpose on-line computing system 
featuring a small storage scope. 


Applications of all three systems for classroom demonstrations 
and student exercises are discussed and illustrated.


Index Terms for the IBM Subject Index 


Teaching 

Statistics 

Computations 

Time-Sharing
05-Computer Application
16-Mathematics


LIMITED DISTRIBUTION NOTICE 


This report has been submitted for publication elsewhere
and has been issued as a Technical Report for
early dissemination of its contents. As a courtesy
to the intended publisher, it should not be widely
distributed until after the date of outside publication.


TABLE OF CONTENTS

I. INTRODUCTION
II. RESIDUAL ANALYSIS
III. THE GENERAL LINEAR MODEL
IV. THE CULLER SYSTEM
V. SUMMARY
ACKNOWLEDGMENTS
REFERENCES
APPENDIX A: Example of Console Session Using COMB
APPENDIX B: Example of Console Session Using COSMOS


I. INTRODUCTION 


The past decade has witnessed the widespread and ever growing 
use of electronic computers by universities, government, industry and 
private research facilities. Applications ranging from routine data pro- 
cessing to highly sophisticated modeling, and encompassing virtually every 
scientific discipline and type of business activity, have been encountered. 
However, despite the fact that advances in computer technology have been 
rapid and impressive, they have not always been accompanied by correspond- 
ing strides in many areas of computer application. With rare exception, the 
entire area of statistical computer application, which has been characterized
mainly by a proliferation of "canned" programs, has been rather routine
and unimaginative. Moreover, because of the wide availability of
such programs, the teaching of statistical computation and data analysis
has been de-emphasized instead of being re-directed to take advantage of
the computer.

We are just now at the threshold of a new era in computing, enter- 
ing the age of the time-shared computer. Briefly, a time-sharing system 
allows many users to access a large central computer simultaneously from 
remote terminals such as typewriters, keyboards or scopes. The individual 
user can typically enter data and instructions, compile, edit, load or execute 
programs, and obtain responses from the computer so rapidly that in effect 
he may feel as if he is the sole user of the computer. This nearly instantaneous
communication facility provided between the individual and the computer
will doubtless lead to countless hitherto unimagined applications

of great potential value. From the standpoint of the statistician engaged 
in data analysis, a remote console may be viewed as a powerful computa- 
tional tool, supplanting, and far exceeding the capabilities of conventional 
devices such as desk calculators, graph paper and tables of functions. To 
assure widespread use of computer consoles in this fashion, it is essential 
that software systems designed for such purposes be easy to learn and 
easy to use by the computer novice. In effect, the non-programmer statis- 
tician should be readily able to use the console as a computational tool for 
carrying out, on large bodies of data, virtually any calculations that he 
might be able to perform on small bodies of data using tools such as desk 
calculators, graph paper and tables of functions. 

One of the most important features of any digital computer is its ability 
to make decisions during the execution of a program by means of pro- 
grammed conditional branching instructions; a corresponding innovation 
of great importance in a time-shared computing environment is the facility 
of introducing human decision making within the program. That is, because 
of the rapid man-machine communication facility afforded by a time-shared 
computing system, the user can direct the computer to produce a sequence 
of complex computations, receive the results of such computations (in 
either numerical or graphical form) immediately, and then decide, based 
on examination of such results, what computations he would like the computer 
to carry out next. Exploitation of problem areas characterized by the require- 
ment of this type of rapid man-machine interaction will produce valuable 


problem solving approaches not previously available. 


Having attempted to motivate the ensuing discussion, we shall proceed 


to describe the application of remote console computing in a graduate 
statistics seminar offered at Harvard University by the author in the spring 
semester 1965-1966. A brief description of the course, taken from the 
catalogue of the Harvard Graduate School of Arts and Sciences, follows: 
"Statistics 285, Machine Aided Statistical Modeling. 

Application of time-shared computing to the construction and testing of 
statistical models. Topics will include methods for analyzing residuals 
from a least squares fitting and assessing the validity of underlying assump- 
tions. Students will use remote consoles to generate and analyze data from 
a variety of mathematical models. 

Prerequisite: 

Statistics 139 (Analysis of Variance). Knowledge of programming is not 
required."

Lectures were held in a room at the Harvard Computation Center equipped
with a closed circuit television receiver which was used to display 
console operation televised from a nearby machine room. Two remotely 
accessed computer systems, the MIT Compatible Time-Sharing System 
(based on the IBM 7094) and the Culler On-Line Computer (based on the 
TRW 400) were used for classroom demonstrations and student projects. 

The principal sections of this paper deal with statistical applications of 

the first of these two systems in the aforementioned course, as demonstrated
on-line by the author at the 1966 Joint Statistical Meetings in Los Angeles, 
August 16, 1966 (Schatzoff (1966)). 

In Section 2, we describe the operation of COMB (Console Oriented Model 
Building), a statistical computing system designed to implement and study the 
analysis of residuals procedures of Anscombe and Tukey (1963). We then 


proceed in Section 3, to discuss a second statistical system, COSMOS (Console 


Oriented Statistical Matrix Operator System), which employs the basic 
operators defined by Beaton (1964) to tackle problems associated with the 
general linear model. In Section 4, we indicate briefly some of the applications
of the Culler system, which is used primarily to manipulate and 
display functions visually on an electronic storage scope, and conclude in 
Section 5 with some remarks relative to future implications.

II. RESIDUAL ANALYSIS 

There has been considerable interest, in recent years, in methods for 
examining residuals from a least squares fitting, with a view to asses- 
sing the validity of the usual model assumptions. An excellent summary of 
a number of such procedures is provided by Anscombe and Tukey (1963). 

Use of a time-shared computer to implement the Anscombe-Tukey procedures 
has been described previously by the author (Schatzoff (1965)); in this section, 
we shall describe a much later version of this computational approach, by 
illustrating the use of the COMB system from a typewriter console. The 
name COMB is meant to imply also the ability to comb a set of data in a 
number of directions in order to ascertain what the data has to tell us. 

The COMB system was implemented initially under the MIT Compatible 
Time-Sharing System. In effect, when the user types commands at a type- 
writer console, the full computational power of an IBM 7094 computer is 
made available to him. In following the examples provided in the appendices, 
the reader should bear in mind that the typewriter console provides a two 
way communication channel. Messages typed by the user appear in lower 
case type, while output from the computer is in upper case. Thus, it is 
easy for the reader to interpret each of the examples as a two way dialogue 
between the statistician and the computer.

Referring to Appendix A, the user initiates his session by typing the 
command "r comb", which is interpreted by the computer as meaning 
"resume operation of a program named comb". The computer replies 
with the message "W 1020.7", meaning "Wait until I find your program 
called comb (which is stored permanently on a magnetic disc file) and 
activate it (i.e., bring it into core storage and begin execution). The time 
of day is now 1020.7." The computer then types the message "READY" a 
few seconds later, indicating that the COMB program has been activated 
and is ready for the user to proceed.

It should be pointed out at this time that COMB is tailored to the residual
analysis procedures for a two way fixed effects analysis of variance 
model. Since the program was intended primarily as a pedagogical device, 
it provides facilities for generating data from such a model, with provisions
whereby the user can specify any of a number of types of departures 
from the usual normal theory assumptions. The types of departures which 
may be specified are precisely those which the Anscombe-Tukey procedures 
are supposed to detect, so that the system provides a means for investigating
the behavior of these procedures. 

Returning to the example, the user types the command "start" which 
tells the computer to start operation of the program by allowing him either 
to enter or generate a set of data. The ensuing question and answer session
is fairly self-explanatory. Depending on the responses given at each 
point by the user, he may type in a set of data, use the data saved from 
his previous session at the console, or have the computer generate a set 
of data from a mathematical model specified by him. If the latter option 
is selected, the user must indicate the size of the experiment; additionally, 
he must either request the computer to generate numerical values of the 
parameters of the model, or, as was the case in this example, type in these 
values himself. The model is of the form: 


(1) $y_{ijk} = \mu + \alpha_i + \beta_j + G\,\alpha_i \beta_j + e_{ijk}$

where the $e_{ijk}$ are independent $N(0, \sigma^2)$.

In the example, an asterisk denotes multiplication, while a quote symbol 
denotes electronic erasure of the previous character. Thus, the last q 
characters typed on a line may be erased from the computer's memory by 
typing q quote marks. An entire line may be erased by means of the question
mark. As evidenced by the series of questions, the user may specify 
either an additive model (G = 0) or an interactive model (G ≠ 0), and may 
perturb the model by: 

1. Sampling each error term $e_{ijk}$ from a normal distribution whose 
variance depends on the level $\mu_{ij} = E(y_{ijk})$. The nature of 
the dependence is $\sigma_{ij}^2 = a \exp(b\,\mu_{ij})$, and if this option is 
selected, the user receives instructions for selecting a and b. 
2. Sampling the error terms from a scale contaminated normal 
distribution $p\,N(0, \sigma^2) + (1-p)\,N(0, k\sigma^2)$, 
where the user may specify p and k. 
3. Specifying a fixed number of outliers of the form 
[formula illegible in the source]. 
The locations of the outliers (i.e., the indices i, j and k) 
are selected at random by the computer.
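
The data generation step can be sketched in a few lines of Python. This is only an illustration, not the COMB code: it assumes that model (1) carries a nonadditivity term of Tukey's form G*alpha_i*beta_j, and that the contaminating component of option 2 has variance k times sigma squared. The function name `generate_two_way` and all of its parameters are hypothetical.

```python
import math
import random


def generate_two_way(mu, alpha, beta, G=0.0, sigma=1.0, reps=1,
                     p=1.0, k=1.0, seed=None):
    """Return y[i][j][r] = mu + alpha[i] + beta[j] + G*alpha[i]*beta[j] + e.

    Assumed error law (scale-contaminated normal): each error is drawn
    from N(0, sigma^2) with probability p and from N(0, k*sigma^2)
    otherwise. With p = 1 the errors are plain normal.
    """
    rng = random.Random(seed)
    data = []
    for a in alpha:
        row = []
        for b in beta:
            cell = []
            for _ in range(reps):
                # Choose the clean or the contaminating scale for this error.
                sd = sigma if rng.random() < p else sigma * math.sqrt(k)
                cell.append(mu + a + b + G * a * b + rng.gauss(0.0, sd))
            row.append(cell)
        data.append(row)
    return data


# A 2 x 3 layout with interaction (G != 0) and 10% contamination.
sample = generate_two_way(10.0, [-1.0, 1.0], [-2.0, 0.0, 2.0],
                          G=0.5, sigma=1.0, p=0.9, k=9.0, seed=1966)
```

Setting G to zero gives the additive model, so the same routine serves for both cases mentioned in the text.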

When the data generation has been completed, the computer types the 
message "READY", signifying that the analyst may proceed to issue commands. 
A list of available commands (or codes), together with their definitions, may 
be obtained by typing the command "list", as shown in the next section of 
the example. As indicated by this list of commands, one can print the data, 
fitted values, residuals and estimates of the mean and main effects, perform
any of a number of statistical analyses and tests, obtain specialized 
plots, make transformations on the observations or change the model itself. 
Further, he can at any time retrieve the original data, process a new set 
of data or use the computer as a desk calculator. Finally, the user can, 
by typing the command "enlite", be "enlightened" regarding the model from 
which the data were generated. In response to this command, the computer 
will type the values of all the parameters of the model, indicate the nature 
of the dependence of variance upon level of response if applicable, and 
identify the locations and values of any contaminated observations or outliers.
Thus, the instructor can use the system to generate the data to be 
analyzed by students, and they in turn can be "enlightened", at the conclusion
of their analyses, as to the "true state of nature". 

The user may issue one or more commands at a time, in any sequence 
whatever. Thus, turning to Section A.3, we see that the four commands 
"data res fit est" may be typed on a single line, and the computer will 
print the data matrix, residuals, fitted values and estimates of the mean 
and main effects before indicating that it is "READY" for further instructions.
In Section A.4, the command "shape" calculates estimates of the 
third and fourth standardized moments (G1 and G2) together with their standard
errors and corresponding t values for testing normality, as per 
Anscombe (1961). This is followed by Tukey's (1949) one degree of freedom 
test for nonadditivity, where K is the regression coefficient of the residuals 
on the squared fitted values, F is the F-ratio for the one degree of freedom 
test, and P is a suggested power transformation for removing nonadditivity.
It should be noted that the data generation phase of the program permits
generation of a nonadditivity term of the form assumed by Tukey, and 
that the list of commands includes the facility for making power transformations.
Thus, it is easy to learn, on an empirical basis, how well the test 
and the transformation work. The third command in the sequence, "errdep", 
produces calculations associated with Anscombe's (1961) test of error variance
dependent on level. Here, H is a linear regression coefficient of the 
squared residuals on the fitted values ($\hat{y}_{ij}$), T is an approximate t-statistic 
for testing the null hypothesis $H_0$: H = 0, and P is a suggested power transformation
for removing the type of dependence assumed by Anscombe. Note 
again that the program permits generation of data from just such a model. 
Finally, an analysis of variance table can be produced by typing the command 
"anova", and the observed significance level corresponding to any observed 
F-ratio can be computed with the command "fdist". The example is concluded
in Section A.5 with a full normal plot (Tukey (1962)) of the residuals, 
and "enlightenment". The cell index corresponding to each residual is indicated
at the left hand margin of the full normal plot, which is scaled at the 
top in standard deviation units of the quantities being plotted. Thus, the 
square root of the mean square for error is indicated on the top of the plot 
as S = 1.204. Directly below it is the estimate of σ obtained by averaging the 
points on the full normal plot. The middle one-third of the points are suppressed
from the plot, as suggested by Tukey, since they have a large variance.
Furthermore, most types of departures from the standard assumptions, 
such as skewness, heavy tails, and outliers, will be evidenced in the tails. 
Samples from a normal distribution should plot in a vertical straight line, 
centered at the standard deviation of the distribution. 
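
As an illustration of the one degree of freedom test mentioned above, the following Python sketch computes the textbook form of Tukey's statistic for a two way table with one observation per cell. It is not the COMB routine: here K is taken as the regression coefficient of the residuals on the products of the estimated row and column effects, which may differ in scaling from the K printed by COMB, and the suggested power P is not computed. The function name `tukey_one_df` is hypothetical.

```python
def tukey_one_df(y):
    """Tukey's one-degree-of-freedom nonadditivity test for a table y[i][j].

    Returns (K, F): K is the regression coefficient of the residuals on
    the products a[i]*b[j] of estimated effects, and F is the ratio of
    the nonadditivity sum of squares to the remainder mean square.
    """
    r, c = len(y), len(y[0])
    grand = sum(map(sum, y)) / (r * c)
    a = [sum(row) / c - grand for row in y]                  # row effects
    b = [sum(y[i][j] for i in range(r)) / r - grand          # column effects
         for j in range(c)]
    resid = [[y[i][j] - grand - a[i] - b[j] for j in range(c)]
             for i in range(r)]
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    cross = sum(a[i] * b[j] * resid[i][j]
                for i in range(r) for j in range(c))
    K = cross / (saa * sbb)
    ss_nonadd = cross * cross / (saa * sbb)                  # 1 d.f.
    ss_rem = sum(e * e for row in resid for e in row) - ss_nonadd
    df_rem = (r - 1) * (c - 1) - 1
    F = ss_nonadd / (ss_rem / df_rem) if ss_rem > 0 else float("inf")
    return K, F


table = [[1.0, 2.0, 4.0], [2.0, 3.0, 7.0], [3.0, 5.0, 6.0]]
K, F = tukey_one_df(table)
```

When the data follow an interactive model with term G*alpha_i*beta_j exactly and the effects sum to zero, K recovers G, which is one way to check empirically how well the test works.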

In the example, we have not used all of the facilities provided 
in the program. However, it should be clear that one could continue 
to perform tests, make transformations, and change model assumptions in 
any sequence whatever. The various tests and plots are designed to reveal 


particular aspects of the data, and the analyst can ponder the results at each 
stage and decide what to do next. The command structure is extremely 
simple, so that the user does not have to know anything at all about com- 
puters or computer programming. In the course, lecture sessions cov- 
ered the theory underlying the various residual analysis procedures in- 
corporated in the program, and the closed circuit television facility was 
used to demonstrate the application of these procedures. Students were 
provided access to the consoles so that they could experiment with these 
and other techniques, and in effect gain a great deal of data analysis experience
in a very short period of time. On occasion, data was generated 
by the instructor and given to the students to analyze, without providing 
the "enlightenment" facility. Classroom discussion of the analyses and 
comparison of results usually proved interesting and illuminating. 

III. THE GENERAL LINEAR MODEL 

Statistical analyses associated with the general linear model typically 
involve but a few basic types of matrix operations, such as linear trans- 
formations, solution of simultaneous linear equations, successive ortho- 
gonalization of variables and eigenvalue - eigenvector analysis. Beaton 
(1964) in a doctoral dissertation submitted to the Harvard Graduate School 
of Education, illustrated the use of six basic matrix operations to carry 
out standard calculations required in correlation analysis, regression 
analysis, analysis of variance and covariance, and the usual collection of 
multivariate procedures such as multivariate regression and analysis of 
variance, discriminant analysis, principal component analysis and canonical 
correlation analysis. It is pedagogically attractive for students to perform 
these kinds of analyses in terms of matrix operations since they would have 
learned the underlying theory in terms of matrices. Such an approach obviates
the necessity of horrendous desk calculations yet preserves the 
spirit of requiring the user to understand what he is doing, rather than 
permitting him to use canned programs, which may not possess the flexibility
to provide what he really wants. 

In this section, we illustrate the operation of an on-line statistical com- 
puting system called COSMOS (Console Oriented Statistical Matrix Operator 
System), which includes all of the Beaton operators, and has a simple com- 
mand structure. Data may be entered from an IBM 1050 typewriter console, 
and commands may be executed one at a time, by typing one command per 
line, or sequentially, by typing several commands on a line. In general, 
the matrix result of any operation may be used as an input matrix to the 
subsequent operation, and intermediate results may be printed or saved 
under matrix names provided by the user for later reference. Commands 
may be executed in any sequence specified by the user, so that the system 
provides great flexibility in allowing the user to tailor the analysis to the 
particular situation. Finally, there is a "macro" facility which enables the 
user to create new commands consisting of specified sequences of system 
commands and other "macro" commands. Thus, the basic commands in 
the system can be used to create higher level commands and "canned" 
programs in a very simple manner. 

A command in the COSMOS system consists of the name of an operator, 
followed by the name of a matrix (or matrices) upon which the operator is 
to act, and a list (or the name of a list) which refers to the range of the 
operation within the matrix. Omission of the list implies operation on all 
rows and columns of the matrix. For example, the command 
(3.1) prm data .
means print the matrix called data, whereas the command 

(3.2) prm data 1 2 , 2 4 8 .

means print the matrix defined by the intersection of rows 1 and 2 with 
columns 2, 4 and 8 of the matrix called data. The same result could have 
been achieved by typing the command sequence 

(3.3) stl dick 1 2 . stl jane 2 4 8 . prm data dick , jane .
where the command stl dick 1 2 . means store a list called dick containing
the elements 1 2. 

Since it is frequently desirable to use the result of an operation as in- 
put to the subsequent operation, omission of a matrix name in the command 
implies that the previous matrix result, referred to by the COSMOS system 
as WKM1 (work matrix 1), will be used as the input matrix. For example, to 
print the result of the previous matrix operation, one simply types the command
(3.4) prm .

Having introduced these few preliminaries concerning the system usage 
conventions, we will proceed to define three of the six basic matrix opera- 
tors and illustrate their use in the system to carry out stepwise correlation 
and regression analyses. 

Suppose we have a data matrix $D_{n \times p}$ consisting of n observations on 
p variables. The command 

(3.5) scp d listr , listc .

means calculate the sum of cross products matrix D*' D*, where the submatrix
D* is defined by the intersection of the rows of d specified by 
listr and the columns specified by listc. If the lists are omitted, the 
command computes D'D. (It should be noted that the matrix "D" is typed 
"d" since the user is restricted to lower case.) Suppose now that we have 
a square matrix $A_{p \times p}$, which might have resulted from having executed 
an scp command. The command 





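(3.6) swp a k .
with k a single element list, means sweep the matrix A on the kth pivotal 
element. It will produce a matrix B defined by 

(3.7) $$\begin{aligned}
B_{kk} &= 1 / A_{kk} \\
B_{ik} &= -A_{ik} / A_{kk}, \quad i \neq k \\
B_{kj} &= A_{kj} / A_{kk}, \quad j \neq k \\
B_{ij} &= A_{ij} - A_{ik} A_{kj} / A_{kk}, \quad i, j \neq k.
\end{aligned}$$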

The use of swp with a multiple element list has the effect of sweeping the 
matrix on the pivotal element specified by the first element of the list, 
then sweeping the resulting matrix on the pivotal element specified by the 
second element of the list etc. Thus, the command 

(3.8) swp a r s ... t .

is equivalent to the command sequence 

(3.9) swp a r . swp s . ... swp t .

It is easy to check from (3.7) that the swp operation is commutative and 
reversible. That is, application of the sweep operator with a given list 
is equivalent to sweeping with any permutation of the elements of that list, 
and sweeping twice on a given pivotal element is equivalent to not having 
swept on that element at all. Finally, if we consider the matrix A partitioned 


along the first m rows and columns as

(3.10) $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$

and issue the command 
(3.11) swp a 1 2 ... m .

we obtain the resulting matrix 

(3.12) $\begin{pmatrix} A_{11}^{-1} & A_{11}^{-1} A_{12} \\ -A_{21} A_{11}^{-1} & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix}$
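
The properties claimed for the sweep operator can be checked numerically. The Python sketch below is an illustration rather than the COSMOS implementation; it assumes the sign convention in which the pivot column is negated, the pivot row is not, and the pivot itself becomes 1/A_kk, since under that convention a sweep is its own inverse as the text states. Indices are 0-based here, unlike the 1-based lists typed at the console, and exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction


def sweep(A, pivots):
    """Sweep the square matrix A (a list of rows) on each pivot in turn."""
    n = len(A)
    for k in pivots:
        d = A[k][k]
        # Elements outside the pivot row and column.
        B = [[A[i][j] - A[i][k] * A[k][j] / d for j in range(n)]
             for i in range(n)]
        for i in range(n):
            B[i][k] = -A[i][k] / d   # pivot column (sign is an assumption)
            B[k][i] = A[k][i] / d    # pivot row
        B[k][k] = 1 / d              # pivotal element
        A = B
    return A


A = [[Fraction(v) for v in row]
     for row in [[4, 2, 1], [2, 5, 3], [1, 3, 6]]]

reversible = sweep(sweep(A, [0]), [0]) == A          # sweeping twice undoes itself
commutative = sweep(A, [0, 1]) == sweep(A, [1, 0])   # order does not matter
```

Under this convention, sweeping on every pivotal element yields the ordinary inverse of the matrix, which is the case m = p of (3.12).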


The third operator which we shall employ is called standardize, and 


abbreviated std. The command 


(3.13) std a r s ... t .
transforms the matrix A into a matrix B with elements defined by 

(3.14) $B_{ij} = A_{ij} / (A_{ii} A_{jj})^{1/2}$


for all i, j belonging to the list. 

Appendix B contains an example of a console session using the COSMOS 
system and the same set of data used by Beaton (1964), who borrowed the 
example from Walker and Lev (1953). The data consists of observations on 


six variables for each of 98 students. The six variables are: 

$X_1$ Reading Score 
$X_2$ Artificial Language Score 
$X_3$ Arithmetic Score 
$X_4$ Mid-Term Test Score 
$X_5$ Final Examination Score 
$X_6$ Semester Grade 

Additionally, a dummy variable having the value 1 for all observations 
was appended to the data matrix for purposes of convenience, which will be- 
come apparent shortly. We start the analysis at the point where the data 
has already been entered into the computer by means of the command stm 
(store a matrix) and filed permanently under the name "data". 


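Referring to Appendix B, the first sequence of commands computes and 
prints the sum of cross products matrix for the six variables and the 
dummy variable. This matrix is of the form 

(3.15) $$\begin{pmatrix}
\sum x_1^2 & \sum x_1 x_2 & \cdots & \sum x_1 x_6 & \sum x_1 \\
\sum x_2 x_1 & \sum x_2^2 & \cdots & \sum x_2 x_6 & \sum x_2 \\
\vdots & \vdots & & \vdots & \vdots \\
\sum x_6 x_1 & \sum x_6 x_2 & \cdots & \sum x_6^2 & \sum x_6 \\
\sum x_1 & \sum x_2 & \cdots & \sum x_6 & n
\end{pmatrix}$$

where the summations are taken over the n observations, and in the 
example, n = 98. 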

The next sequence of commands sweeps the sum of cross products matrix 
on the 7th pivotal element, prints the result, and stores three lists representing
all the variables ($X_1, \ldots, X_6$), the independent variables ($X_1, \ldots, X_3$) 
and the dependent variables ($X_4, \ldots, X_6$). The function of the dummy variable
($X_7$) is laid bare by the command swp 7 . which transforms the sum of 
cross products matrix (3.15) to a mean-centered cross products matrix. 
Thus, one can check directly from the definition of swp (3.7) that the (i,j) 
element of the resulting matrix will be given by $\sum x_i x_j - (\sum x_i)(\sum x_j)/n$.

The third sequence of commands specifies that the mean-centered cross 
products matrix resulting from the previous sequence of commands is to be 
saved under the name "cov", and that all the variables are to be "standardized"
and printed. As the reader may check from (3.14), the result of this 
operation is the matrix of simple correlation coefficients for the six variables.
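
The chain of operations just described (form the sum of cross products with a dummy variable, sweep on the dummy pivot, then standardize) can be followed on a toy data set. The Python sketch below is illustrative only; the function names mirror the COSMOS operators but are not the system's code, the five observations are invented, and the sweep uses the assumed self-inverse sign convention (for the mean-centering step only the non-pivot elements matter, and they do not depend on that choice).

```python
import math


def scp(D):
    """Sum of cross products matrix D'D for data D (rows = observations)."""
    p = len(D[0])
    return [[sum(r[i] * r[j] for r in D) for j in range(p)] for i in range(p)]


def swp(A, k):
    """Sweep A on the kth pivotal element (0-based index)."""
    n = len(A)
    d = A[k][k]
    B = [[A[i][j] - A[i][k] * A[k][j] / d for j in range(n)] for i in range(n)]
    for i in range(n):
        B[i][k] = -A[i][k] / d   # pivot column (sign is an assumption)
        B[k][i] = A[k][i] / d    # pivot row
    B[k][k] = 1 / d
    return B


def std(A, idx):
    """Divide A[i][j] by sqrt(A[i][i] * A[j][j]) for i, j in idx."""
    B = [row[:] for row in A]
    for i in idx:
        for j in idx:
            B[i][j] = A[i][j] / math.sqrt(A[i][i] * A[j][j])
    return B


def correlations(data):
    """Correlation matrix obtained by scp, a sweep on the dummy, and std."""
    p = len(data[0])
    D = [list(row) + [1.0] for row in data]   # append the dummy variable
    centered = swp(scp(D), p)                 # mean-centered cross products
    return std(centered, range(p))


toy = [(1, 2, 5), (2, 4, 3), (3, 6, 4), (4, 8, 1), (5, 10, 2)]
R = correlations(toy)
```

In the toy data the second variable is exactly twice the first, so the corresponding correlation is 1, while the first and third variables have correlation -0.8.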


Next, we see from (3.12) that sweeping the mean-centered cross products 
matrix ("cov") on the independent variables (1 2 3) and printing the seventh 
row and column first, results in the matrix 

(3.16) $$\begin{pmatrix}
(X'X)^{-1} & (X'X)^{-1} X'Y \\
-Y'X (X'X)^{-1} & Y'Y - Y'X (X'X)^{-1} X'Y
\end{pmatrix}$$

with rows and columns in the order 7, 1, 2, 3, 4, 5, 6, where

$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & x_{13} \\ 1 & x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} \end{pmatrix}, \qquad
Y = \begin{pmatrix} x_{14} & x_{15} & x_{16} \\ x_{24} & x_{25} & x_{26} \\ \vdots & \vdots & \vdots \\ x_{n4} & x_{n5} & x_{n6} \end{pmatrix}.$$

It is clear from the least squares theory that the columns of the upper right 
hand sub-matrix contain the least squares estimates of the regression coefficients
of $X_4$, $X_5$ and $X_6$ on $X_1$, $X_2$ and $X_3$, while the lower right hand sub-matrix
is $(n-4)\,S$, where $S$ is the residual covariance matrix of the 
dependent variables ($X_4$, $X_5$ and $X_6$) on the independent variables ($X_1$, $X_2$ and 
$X_3$). Standardizing the latter submatrix therefore produces the matrix of 
partial correlation coefficients of the dependent variables, after removing the 
effects of the independent variables. 

The next section of the example, starting with the command swp cov 1 ., 
shows the steps required to obtain all seven possible regressions of the dependent
variables on the independent variables in exactly seven sweeps. It 
can in fact be readily shown that all $2^k - 1$ regressions of a set of dependent 
variables on a set of k independent variables can always be obtained with 
$2^k - 1$ sweeps of the mean-centered sum of cross-products matrix. 
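
One way to see the count is to visit the subsets of independent variables in Gray-code order, so that consecutive subsets differ in a single variable and each step costs one sweep (sweeping a variable in, or sweeping it out again by reversibility). The Python sketch below illustrates this ordering; the ordering is our choice and need not match the sequence used in Appendix B, the sweep uses the assumed self-inverse sign convention, and the 4 x 4 matrix is invented, with index 3 playing the dependent variable.

```python
from fractions import Fraction


def swp(A, k):
    """Sweep A on the kth pivotal element (0-based, self-inverse convention)."""
    n = len(A)
    d = A[k][k]
    B = [[A[i][j] - A[i][k] * A[k][j] / d for j in range(n)] for i in range(n)]
    for i in range(n):
        B[i][k] = -A[i][k] / d   # pivot column (sign is an assumption)
        B[k][i] = A[k][i] / d    # pivot row
    B[k][k] = 1 / d
    return B


def all_regressions(cov, k):
    """Yield (subset, swept matrix) for each nonempty subset of 0..k-1.

    Step t flips the variable given by the lowest set bit of t, which
    walks the subsets in Gray-code order using one sweep per step.
    """
    A = [row[:] for row in cov]
    chosen = set()
    for t in range(1, 2 ** k):
        pivot = (t & -t).bit_length() - 1
        A = swp(A, pivot)        # sweeps the variable in or back out
        chosen ^= {pivot}
        yield frozenset(chosen), A


cov = [[Fraction(v) for v in row]
       for row in [[4, 2, 0, 2], [2, 5, 1, 1], [0, 1, 3, 3], [2, 1, 3, 10]]]
results = list(all_regressions(cov, 3))
```

For each subset, the regression coefficients of the dependent variable appear in the swept rows of the last column of the accompanying matrix.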


Next, the use of the macro instruction to create a "canned" regression 
program is illustrated. The sequence of commands required for calculating
and printing correlation coefficients, least squares estimates of regression
coefficients and residual sums of cross-products, and partial 
correlation coefficients, are typed after the word "macro" and the name 
"regr" is assigned to the sequence. The commands are not carried out 
at this time, but rather, the command "regr", which consists of the 
given sequence of commands, is added to the list of COSMOS commands. 
Now, by merely typing the command "regr", the initial regression and 
partial correlation analysis can be carried out. Any number of such 
macros can be created by the user, and a macro command such as 
"regr" can itself be part of another macro. In this manner, the user 
can create quite intricate programs with relative ease. On typing the 
command "quit", the computer types the user dictionary, which contains 
the names of all matrices, lists and macros currently defined in the 
system. The user is then provided with instructions for saving such 
information if he so desires, and the computer types "R" (meaning 
"ready" for further commands) and provides an indication of the computer 
time, in seconds, required for operating the program (40.716) and for 
swapping the program in and out of core storage to accommodate other 
users (14.550). Finally, the user logs out, physically disconnecting his 
terminal from the computer.
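
The behaviour of nested macros can be pictured with a short Python sketch. It is a schematic of the idea only, not of how COSMOS stores or executes macros: the dictionary layout, the function name `expand` and the command sequences assigned to the two macros are all hypothetical.

```python
def expand(command, macros, depth=0):
    """Expand a command name into the flat list of basic commands it names.

    `macros` maps a macro name to the list of commands (basic commands or
    other macros) typed when it was defined. Any name not in `macros` is
    treated as a basic system command.
    """
    if depth > 50:
        raise RecursionError("macro definitions appear to be circular")
    if command not in macros:
        return [command]
    flat = []
    for part in macros[command]:
        flat.extend(expand(part, macros, depth + 1))
    return flat


# Hypothetical definitions: "report" uses the macro "regr" as one step.
macros = {
    "regr": ["std", "prm", "swp", "prm"],
    "report": ["scp", "regr"],
}
steps = expand("report", macros)
```

Expanding at execution time rather than at definition time means that redefining "regr" would change what "report" does, which is one plausible design; the source does not say which choice COSMOS makes.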

It should be remembered that we have illustrated but a few of the 
commands available in the COSMOS system. As indicated earlier, the 
system includes other commands which greatly facilitate stepwise multi- 
variate analyses. These basic operators, when used in conjunction with 


operators designed to allow a wide variety of data manipulation and 


selection options, transformations, sampling procedures and indexing 
and branching features, will permit very flexible, tailored analyses of 
highly complex bodies of data with relative ease. By using the COSMOS 
system, which is very easy to learn, the student can obtain valuable experience
in analyzing real data, just as fast as such data can be 
provided for his use. Accordingly, one of the aims of the COSMOS 
system is to provide a data bank consisting of real data from a variety 
of application fields. It is hoped that widespread availability of 
COSMOS and appropriate computing facilities will encourage statisticians 
to analyze data themselves at computer consoles, and to experiment 
with new techniques of data analysis. 

IV. THE CULLER SYSTEM 

The Culler on-line computer, physically located at the University of 
California at Santa Barbara, is operated from specially designed remote 
consoles which present the user with a keyboard for entering data and 
instructions and a small electronic storage scope for displaying instruc- 
tions, data and functions. Each button on the keyboard performs a 
specific function such as add, multiply, sin, log, display, sort, etc. 
There are a number of different levels of operation within the system, 
so that if the system is in the vector level for example, as specified 
by pushing the appropriate vector usage level button, the add button signi- 
fies addition of vectors, whereas if the system is in a scalar level, one 
may operate with scalars, including individual components of vectors or 
matrices. Other levels are provided for operating with real and complex 
arrays, special functions, and user-defined functions. 

Because the system is designed to operate with "one-button pushes", 


it is very easy to use. For instance, one can construct and display 
density and distribution functions quite readily, draw samples from these 
distributions and display the cumulative sample functions on appropriate 
scales such as normal probability, full normal, half-normal, or any 
other desired scale with but a few button pushes. There is also a macro 
facility provided in the system, so that the user can construct and store 
subprograms which themselves may be activated by depressing a single 
button. Subprogram commands may themselves be included as commands
in other subprograms so that fairly complex programs can be 
created and stored in the computer. 

Classroom utilization of the Culler system was primarily devoted to
the types of operations described above. Density and distribution
functions for normal, chi-square and contaminated normal distributions
were constructed and displayed on the scope. We could then draw
repeated samples from any of these distributions and plot the cumulative
sample functions on arithmetic, normal or full normal scales. Outliers
could be introduced into the samples, and the cumulative sample functions
could be displayed on any of the indicated scales before and after techniques
such as Winsorization, trimming, rejection of outliers, or other types
of transformations had been employed. Utilization of the Culler system
in this manner provided considerable insight into the operation of the
indicated techniques as well as lively classroom discussion, which
would frequently lead to suggestions which could be implemented
spontaneously by pushing a few buttons. Such sessions were generally
informative and enjoyable.
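
The effect of trimming and Winsorization on a contaminated sample can be
sketched in a few lines of Python. This is an illustration only, not a
description of the Culler system: the function names are hypothetical,
and the contamination settings (standard deviation multiplied by 5 with
probability .1) are assumed values chosen for the example.

```python
import random
from statistics import mean


def contaminated_normal(n, rng, multiplier=5.0, prob=0.1):
    """Draw n values from N(0, 1), inflating sigma by `multiplier` with probability `prob`."""
    return [
        rng.gauss(0.0, multiplier if rng.random() < prob else 1.0)
        for _ in range(n)
    ]


def trimmed(sample, k):
    """Drop the k smallest and k largest observations."""
    if 2 * k >= len(sample):
        raise ValueError("k is too large for this sample")
    ordered = sorted(sample)
    return ordered[k:len(ordered) - k]


def winsorized(sample, k):
    """Replace the k smallest and k largest observations by their nearest retained neighbors."""
    if 2 * k >= len(sample):
        raise ValueError("k is too large for this sample")
    ordered = sorted(sample)
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    return [min(max(x, low), high) for x in ordered]


rng = random.Random(1966)  # arbitrary seed, fixed for repeatability
sample = contaminated_normal(30, rng)
print("raw mean       ", round(mean(sample), 3))
print("trimmed mean   ", round(mean(trimmed(sample, 2)), 3))
print("Winsorized mean", round(mean(winsorized(sample, 2)), 3))
```

Trimming shortens the sample, whereas Winsorization keeps the sample size
and pulls the extreme values in to the nearest retained observations.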

As with the MIT Compatible Time Sharing System and the two statistical
systems described in Sections II and III, students had access to the
Culler System, and everyone managed to get a fair amount of hands-on
time on one or the other (or both) of the systems. Each student was
required to present a paper to the class, describing a project undertaken
on one of the two computers.
V. SUMMARY 

The use of time-shared computers as an aid in teaching techniques
of statistical model building and data analysis at Harvard University has
been described in some detail. We have been fortunate in having terminals
for the two computer systems available at Harvard so that a course
of the nature described here could be undertaken. Within the next few
years, many universities throughout the country will enjoy the availability
of similar computational facilities; it is hoped that this revolutionary
type of computing service will lead to a corresponding revolution in the
use of computers by statisticians and other scientists engaged in research
or teaching.

The course described in this paper was of necessity experimental
in nature, and as with any other course offered for the first time, it is
bound to undergo many changes the next time it is given. However, I
am sure that the undertaking constituted a step in the right direction,
and that similar applications of computers will be explored on a wider
basis in the coming years. The prudent integration of on-line computing
into a variety of statistics courses, through classroom use and homework
exercises, would, I believe, constitute an important advance in the
teaching and learning of statistical methods.

In assessing the value of the course, it is felt that a number of
worthwhile objectives were achieved. First, students were able to better
understand the particular methods studied by using these methods to
analyze data and observing their behavior under a variety of model
assumptions. Experience in data analysis is itself important in the
training of any statistician, whether that training be oriented to theory or
practice, for one develops insight into data analysis problems very
rapidly in the course of working with data. A fringe benefit was that of
introducing some of the students to computers for the first time; it is
perhaps not unreasonable to require some sort of computer training
of every statistics student. If such training is
oriented to statistical applications, it is perhaps more palatable than a
general purpose computer course. Finally, it is my hope (and belief)
that some of the students will have been sufficiently motivated to develop
their own approaches to the intelligent use of computer resources in
research, teaching and data analysis, for a revolution in computing is
truly upon us, and statisticians should be ready to meet the challenge.


ACKNOWLEDGMENTS: 
I wish to express my gratitude to Professor Frederick Mosteller
for allowing me to experiment with his students in the manner indicated,
and to Professor Anthony Oettinger for having the foresight to have
CTSS and Culler terminals installed at the Harvard Computation Laboratory,
along with closed circuit television facilities to enhance their use
in the classroom. Dr. William Bossert provided valuable assistance in
the use of the Culler System and preparation of classroom demonstrations.
Finally, it is with pleasure that I acknowledge the invaluable contributions
of several staff members of the IBM Cambridge Scientific Center
in the design and programming of the COMB and COSMOS systems.
Foremost among these are Mr. Thomas Burhoe and Dr. Arthur Anger.


REFERENCES: 


Anscombe, F. J., "Rejection of Outliers", Technometrics, 2 (1960),
123-147.


Anscombe, F. J., "Examination of Residuals", Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability
(1961), (University of California Press), 1, 1-36.


Anscombe, F. J. and Tukey, J. W., "The Examination and Analysis
of Residuals", Technometrics, 5 (1963), 141-160.


Beaton, Albert E., "The Use of Special Matrix Operators in Statistical
Calculus", Research Bulletin RB-64-51, (Educational Testing Service)
Princeton, New Jersey, (1964).


Schatzoff, M., "Console Oriented Model Building", Proceedings of the
20th National Conference of the Association for Computing Machinery,
(1965), 354-374.


Schatzoff, M., "Uses of Computers in a Statistics Curriculum: An
On-Line Demonstration", Abstracts Booklet, Summaries of Papers
Presented at the Joint Statistical Meetings of the American Statistical
Association, Biometric Society (Eastern and Western North American
Regions), Institute of Mathematical Statistics (Western Regional
Meeting) and Western Farm Economics Association (American Statistical
Association), 1966.


Tukey, J. W., "One Degree of Freedom for Nonadditivity", Biometrics,
5 (1949), 232-242.


Tukey, J. W., "The Future of Data Analysis", Annals of Mathematical
Statistics, 33, (1962), 1-67.


Walker, Helen M., and Lev, Joseph, Statistical Inference, Henry Holt
and Company, New York, 1953.


APPENDIX A 


Example of Console Session Using COMB 


r comb 

W 1020.7
READY 

start 

DO YOU WISH TO INPUT YOUR OWN DATA, YES OR NO. 


no 
DO YOU WISH TO USE DATA FROM YOUR LAST SESSION. 
no 

INPUT NUMBER OF ROWS, COLUMNS AND REPLICATIONS. 
6,6,1 
WANT RANDOM GENERATION OF PARAMETERS 
no
TYPE MU AND G, WHERE MU IS THE GRAND MEAN AND THE (I,J) INTERACTION TERM IS
G*ALPHA(I)*BETA(J).
10,0 

INPUT ROW MAIN EFFECTS. 
-2.5,-1.5,-.5,.5,1.5,2.5

INPUT COLUMN MAIN EFFECTS. 
-1,-.4,0,.1,.5,.8

WANT ERROR VARIANCE TO DEPEND ON LEVEL 
no 

INPUT SIGMA SQUARED. 


1 

WANT CONTAMINATED ERRORS 

yes 

TYPE SIGMA MULTIPLIER AND PROBABILITY.
5,.1 

WANT OUTLIERS 


no 
READY


list

LIST      DESCRIPTION OF CODES

DATA      PRINT DATA MATRIX
EST       PRINT ESTIMATES OF MEAN AND MAIN EFFECTS
FIT       PRINT FITTED VALUES
RES       PRINT RESIDUALS
OUTL      OUTLIER TEST, USER MUST SPECIFY PREMIUM
ANOVA     ANALYSIS OF VARIANCE TABLE
SHAPE     THIRD AND FOURTH MOMENTS OF RESIDUALS, TOGETHER WITH T
NONADD    ONE DEGREE OF FREEDOM TEST FOR NON-ADDITIVITY
FDIST     UPPER TAIL PROBABILITY OF F DISTRIBUTION, USER MUST SPECIFY
          F STATISTIC AND DEGREES OF FREEDOM
ERRDEP    DEPENDENCE OF VARIABILITY UPON LEVEL OF RESPONSE
FUNOP     FUNOP PLOT
PLOTRF    PLOT OF RESIDUALS VS. FITTED VALUES
LOG       LOGARITHMIC TRANSFORMATION
ASINY     ARCSINE TRANSFORMATION
POWER     POWER TRANSFORMATION, USER MUST SPECIFY EXPONENT
ADDK      ADD A CONSTANT TO EACH OBSERVATION
MPYK      MULTIPLY EACH OBSERVATION BY A CONSTANT
CHGY      MODIFY INDIVIDUAL OBSERVATIONS
CHGMOD    CHANGE THE MODEL FROM ADDITIVE TO INTERACTIVE OR VICE VERSA
RECOUP    RETRIEVE THE ORIGINAL DATA
START     PROCESS A NEW SET OF DATA, PREVIOUS DATA CANNOT BE RETRIEVED
DESCAL    DESK CALCULATOR
CODES     LIST OF CODES
QUIT      TERMINATE SESSION
READY


data res fit est

DATA MATRIX
   6.392    6.627    6.921    9.470    7.907    7.780
   5.671    8.762    8.459    8.955    9.419   10.097
   6.826    8.509    9.644   13.235   11.333   12.899
   8.474    9.380   10.206   11.174    7.913   11.348
  10.869   11.849   10.231   11.049   12.286   11.422
  10.623   10.673   10.470    9.914   12.696   12.976

RESIDUALS
    .524    -.399    -.127    1.111    -.077   -1.032
  -1.242     .692     .367    -.448     .390     .240
  -1.934   -1.408    -.295    1.985     .457    1.195
    .373     .122     .926     .582   -2.305     .302
   1.233    1.055    -.584   -1.078     .533   -1.159
   1.046    -.062    -.286   -2.153    1.002     .454

FITTED VALUES
   5.868    7.026    7.048    8.358    7.985    8.813
   6.912    8.070    8.092    9.403    9.029    9.857
   8.759    9.917    9.939   11.250   10.876   11.704
   8.101    9.259    9.281   10.591   10.218   11.046
   9.636   10.794   10.816   12.126   11.753   12.581
   9.577   10.735   10.757   12.067   11.694   12.522

ESTIMATED MEAN
  9.7905

ESTIMATED ROW MAIN EFFECTS
 -2.2744  -1.2300    .6171   -.0413   1.4938   1.4348

ESTIMATED COLUMN MAIN EFFECTS
 -1.6482   -.4904   -.4685    .8421    .4685   1.2965

READY


shape nonadd errdep anova

G1 =  -1.0494, G2 =   -.0447
STANDARD ERROR OF G1 =    .6565
STANDARD ERROR OF G2 =   1.4682
T1 =  -1.5984, T2 =   -.0305
K =   -.0376
F =    .2492
P =   1.7371
       H   VAR(H)        T        P
   .2192    .0324   1.2177   -.0731
SOURCE          SS    D.F.        MS        F
ROW        68.1504       5   13.6301   9.3992
COLUMN     34.7158       5    6.9432   4.7879
ERROR      36.2534      25    1.4501
TOTAL     139.1196      35
READY
fdist

INPUT F, DF1, DF2
4.7879,5,25

       F     DF1     DF2    ALPHA
  4.7879       5      25    .0033

INPUT F, DF1, DF2

INT. 0
READY


funop 




-.107

MU        GAMMA
10.000    0.

SIGMA =   1.000

ROW MAIN EFFECTS
-2.500   -1.500    -.500     .500    1.500    2.500

COLUMN MAIN EFFECTS
-1.000    -.400    0.        .100     .500     .800

CONTAMINATED ERRORS
1,   HAS BEEN CHANGED FROM    6.841 TO    9.470
2,   HAS BEEN CHANGED FROM    8.810 TO    8.955
3,   HAS BEEN CHANGED FROM    7.766 TO   13.235
3,   HAS BEEN CHANGED FROM    9.486 TO   12.899
6,   HAS BEEN CHANGED FROM   12.593 TO    9.914
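
The estimates and the analysis of variance printed in this session can be
checked against the data matrix itself. The Python sketch below fits the
additive two-way model by row and column means and forms the usual sums
of squares. It is an illustration, not the COMB program: the function
names are hypothetical, and the data values are those read from the DATA
output above, which is printed to three decimals only.

```python
# Data matrix transcribed from the DATA output of the session above
# (6 rows by 6 columns, one replication).
DATA = [
    [6.392, 6.627, 6.921, 9.470, 7.907, 7.780],
    [5.671, 8.762, 8.459, 8.955, 9.419, 10.097],
    [6.826, 8.509, 9.644, 13.235, 11.333, 12.899],
    [8.474, 9.380, 10.206, 11.174, 7.913, 11.348],
    [10.869, 11.849, 10.231, 11.049, 12.286, 11.422],
    [10.623, 10.673, 10.470, 9.914, 12.696, 12.976],
]


def additive_fit(y):
    """Fit y[i][j] = mean + row[i] + col[j] + residual by least squares."""
    r, c = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (r * c)
    row_eff = [sum(row) / c - grand for row in y]
    col_eff = [sum(y[i][j] for i in range(r)) / r - grand for j in range(c)]
    resid = [
        [y[i][j] - grand - row_eff[i] - col_eff[j] for j in range(c)]
        for i in range(r)
    ]
    return grand, row_eff, col_eff, resid


def anova_table(y):
    """Return sums of squares, degrees of freedom and F ratios for the additive fit."""
    r, c = len(y), len(y[0])
    _, row_eff, col_eff, resid = additive_fit(y)
    ss_row = c * sum(a * a for a in row_eff)
    ss_col = r * sum(b * b for b in col_eff)
    ss_err = sum(e * e for row in resid for e in row)
    df_row, df_col, df_err = r - 1, c - 1, (r - 1) * (c - 1)
    ms_err = ss_err / df_err
    return {
        "ss_row": ss_row,
        "ss_col": ss_col,
        "ss_err": ss_err,
        "ss_total": ss_row + ss_col + ss_err,
        "f_row": (ss_row / df_row) / ms_err,
        "f_col": (ss_col / df_col) / ms_err,
    }


grand, row_eff, col_eff, resid = additive_fit(DATA)
table = anova_table(DATA)
print(f"estimated mean = {grand:.4f}")
print("row effects    =", [round(a, 4) for a in row_eff])
print("column effects =", [round(b, 4) for b in col_eff])
for name, value in table.items():
    print(f"{name:8s} = {value:.4f}")
```

Because the data are available only to three decimals, the computed
values should be expected to agree with the printed table to roughly two
or three decimal places rather than exactly.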


APPENDIX B 


Example of Console Session Using COSMOS 


r cosmos 
W 1018.1


COSMOS READY FOR INPUT 


scp data . prn 
KML 
13,0293 
16.4832 
11.7253 
18,5775 
17,4057 
18,9876 
34,9500 


$ 
swno 7 .prem, 
VIKM I 
¥5650 
2564 
2588 
2430 
12267 
2346 
3566 


$ 

save cov , std 
WKMIT 3 

1,0000 

ss 

4824 

23536 

23250 

3686 


$ 
swn cov ind, 
WKMI 

«a 558 

=, 4313 

=.2275 

-,2109 

=,2089 

-.2078 

-.2098 


WKMIT 
1.0000 

«4739 

+8313 


16.4832 
22,2940 
14,6929 
24,4822 
22,872 
23.8069 
45,5000 


stl all 1 2 


2564 
1.1690 
74161 
«6133 
5096 
+5648 
4643 


11.2253 18,5775 
14,6929 24,4822 


19,1581 16,4694 
16.4694 27,8051 
15,4468 25,8375 


16.0461 26,9622 
30,7500 51.4100 


345 6. stl ind 1 2 


+2588 +2430 
4161 6133 
5095 3382 
«3382 8358 
3322 .5679 
©3385 7011 
- 3138 05246 


all « prm all » all ,y 


+3155 
1.9000 
+5392 
6204 
5979 
6170 


prm 7 all , 


-.4313 
2.3200 
-.1259 
“1.0758 
= 12:27 
~.1044 
-.1089 


«4739 
1,9000 
»8766 


4824 3536 
»5392 6204" 
1.0000 +5183 
+5183 1,0000 
5016 6694 
5601 9059 


7 all, std den , prm 


62275 =.2109 
-.1259 =1, 0758 
Le2229 -.9266 
-, 9266 3, 2660 
-,3998 -.2749 
-.2816 ~.3690 
-.3418 -,3299 
«8313 
8766 
1.0000 


17.4057 
22,8742 
15.4468 
25,8375 
24,5381 
25,3264 
48,1700 


18,0876 
23,8069 
16,0461 
26,9622 
25,3268 
26.2882 
50.9600 


3. stl dap 4 56, 


12267 
5096 
03322 
5679 
8611 
67204 
©4915 


+3250 
5079 
5016 
6694 
1,9090 
9170 


dan , den 


«2089 
el 227 
3998 
2749 
«4678 
«2450 
Goa0S> 


«2346 
5648 
3385 
+7011 
7204 
7167 
-5108 


3686 
6170 
-5601 
» 9059 
9170 
1.0000 


. 


~2078 
» 1044 
2816 
3690 
2450 
eDTLS 
24119 


34.9500 
45.5000 
30,7500 
51,8100 
48.1700 
50,9690 
98,0000 


=, 5566 
~, 4643 
=, 50918 
-.5246 
=o hOU5 
-.5108 

-0102 


2098 
1089 
3418 
23299 
3535 
4119 
+3865 


swp cov 1. prm 71 dep , 
WKMT 
GA GIs) ~,6312 
OS) 2 1.7699 
OTe ~. 4301 
-. 3484 -.4013 
-.3628 -.4152 
$ 
swp 2). prm 712 den, 7 
WKM1 3 
6 32:22 -.5008 
-.5008 1.9656 
2873 -.4311 
= 2267. -, 2133 
-.2316 =. 225.9 
Sy ond -,.2176 
$ 
swno 1. orn 72 dep , 7 2 
WKM1 
19K6 -.3972 
-.3972 8554 
-, 2810 ~.52h6 
-.2892 -, 4359 
-.2865 ~. 4831 
$ 
swp 3 prm 723 dep , 7 
WKMI ¢ 
+2556 -.2509 
-,2509 1, 2060 
-.4109 -.9850 
= 2317 -. 4065 
=. 2272 -.2873 
-.2300 -.3477 
5 
swp 1. prm 7 ind ,"dep , 
WKML os 
3358 -.4313 
~.4313 2.3200 
=2275 -.1259 
-.2109 -1.0758 
-.2089 7 1227 
-.2078 -.1044 
-.2098 -,1089 
$ 
swp 2. prm 713 dep, 7 
WKMI 
22932 -.4550 
~.4550 2.3069 
-.3847 -1,1720 
-,2839 = L6N2 
“#2606 -, 1336 
12739 ~.1hhh 


7 1 dep , 


~3712 
74301 
CTSTS 
©4704 
6002 


1 2 dep 


= 2073 
“4341 

» 9500 
4778 
-.3863 
=, 4354 


dep , 


2810 
«5 2N6 
5141 
» 39006 
~4OK9 


23 dep , 


-.4109 
-.9850 
2.7671 
~,3318 
~ 4174 
-.3804 


7 tnd dep 


Se2i24.5 
pares) 
1,.2129 
-. 9266 
-.3998 
-.2816 
-. 3418 


13 dep , 


-.3847 
“1,507.20 
2.5581 
-.5803 
~.5842 
-.5910 


3484 
»4013 
«4704 
«7701 
6263 


12267 
3s 2133 
4778 
©4910 
2761 
+3812 


2892 
04359 
3006 
6390 
oh742 


2317 
ohO6S 
3318 
4ThS 
+2505 
»3592 


-.2109 
-1.0758 
=. 9266 
3.2660 
-.2749 
-.3690 
-.3299 


+2839 
©1642 
5803 
5996 
©3379 
4662 


3628 
©4152 
6002 
6263 
6193 


e2316 
+2259 
3863 
2761 
+6130 
4492 


«2865 
4831 
4049 
4742 
4US9 


+2272 
2873 
e417h 
22505 
-5760 
+4168 


+2089 
+1227 
3998 
2749 
«4678 
«2450 
33535) 


. 2606 
1336 
35842 
3379 
6367 
4913 


2S 
.2176 
74354 
3812 
74492 
4198 


2300 
3477 
3804 
3592 
+4168 
3916 


-2078 
 1LOK4 
2816 
-3690 
+2450 
+5713 
4119 


2739 
Lah 
-5910 
~ 4662 
~4913 
~4828 


. 2098 
+1089 
«3418 
3299 
3535 
©4119 
+3865 


2 


Bis 
swp 1. prm 7 3 dep , 73 dep, 


WKMI 
2034 ~, 6158 3163 2869 3024 
-,6158 1,9627 6638 ~6521 » 6644 
~. 3163 -. 6638 - 6113 3474 4764 
-.2869 -,6521 347k 6444 »4997 
=, 3024 -. 6644 4764 74997 «4918 
$ 
macro std cov all . prm all , all. swp cov tnd »prm 7 all, 7 all, 
std dep . prm dep , dep . reser ym 
regr 
WKM1
 1.0000    .3155    .4824    .3536    .3250    .3686
  .3155   1.0000    .5392    .6204    .5079    .6170
  .4824    .5392   1.0000    .5183    .5016    .5601
  .3536    .6204    .5183   1.0000    .6694    .9059
  .3250    .5079    .5016    .6694   1.0000    .9170
  .3686    .6170    .5601    .9059    .9170   1.0000
$
WKM1
  .3358   -.4313   -.2275   -.2109    .2089    .2078    .2098
 -.4313   2.3200   -.1259  -1.0758    .1227    .1044    .1089
 -.2275   -.1259   1.2129   -.9266    .3998    .2816    .3418
 -.2109  -1.0758   -.9266   3.2660    .2749    .3690    .3299
 -.2089   -.1227   -.3998   -.2749    .4678    .2450    .3535
 -.2078   -.1044   -.2816   -.3690    .2450    .5713    .4119
 -.2098   -.1089   -.3418   -.3299    .3535    .4119    .3865
$
WKM1
 1.0000    .4739    .8313
  .4739   1.0000    .8766
  .8313    .8766   1.0000
quit. 
UNIC : 
WKPL 
WKMI1 
REGBR 
cov 
NFP 
IND 
ALL 
MATA 
CRPRON 


$ 
MEMORY BOUND 56533 
TO REUSE PRIVATE DATA AND DEFINITIONS,
START 
OR 
SAVE  YRNAME 
RESUME  YRNAME 
RETURNING TO CTSS.
R 40.716+14.550 


logout 

W 1040.8 

£0011 9900 LOGGED OUT 10/17/66 1040.9 FROM 20000.
TOTAL TIME USED= 1.2 MIN.
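
The swp commands in this session presumably apply a matrix sweep operator
of the kind discussed by Beaton (1964). The Python sketch below shows one
common formulation of such an operator. It is an assumption-laden
illustration, not the COSMOS implementation: the function name is
hypothetical, and sign conventions for the swept rows and columns differ
between formulations.

```python
def sweep(matrix, k):
    """Return a copy of a square matrix swept on pivot index k.

    Convention used here (one of several in the literature):
      pivot element        ->  1 / d
      rest of pivot row    ->  a[k][j] / d
      rest of pivot column -> -a[i][k] / d
      all other elements   ->  a[i][j] - a[i][k] * a[k][j] / d
    where d is the original pivot element a[k][k].
    """
    n = len(matrix)
    d = matrix[k][k]
    if d == 0:
        raise ZeroDivisionError("cannot sweep on a zero pivot")
    out = [row[:] for row in matrix]
    for j in range(n):
        out[k][j] = matrix[k][j] / d
    for i in range(n):
        if i == k:
            continue
        b = matrix[i][k]
        for j in range(n):
            out[i][j] = matrix[i][j] - b * matrix[k][j] / d
        out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


# Sweeping on every pivot in turn yields the matrix inverse.
a = [[4.0, 2.0], [2.0, 3.0]]
inverse = sweep(sweep(a, 0), 1)
print(inverse)

# Sweeping a cross-product matrix on the predictor gives the regression
# slope (pivot row) and the residual sum of squares (lower right corner).
cross_products = [[2.0, 4.0], [4.0, 10.0]]  # [[Sxx, Sxy], [Sxy, Syy]], made-up values
swept = sweep(cross_products, 0)
print("slope =", swept[0][1], " residual SS =", swept[1][1])
```

Under this convention, sweeping a cross-product matrix on the independent
variables leaves the inverse in the swept block, the regression
coefficients in the adjoining block, and the residual cross-products in
the block belonging to the dependent variables.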





